Skip to content

Conversation

@JoeOsborn
Copy link
Contributor

@JoeOsborn JoeOsborn commented Mar 2, 2025

I got obsessed with this problem (discussed in #23668 , #23021) and implemented the mapfs manifest-mapped filesystem idea. From the header:

Mapping backend

Creates a new path mapping backend based on a provided manifest,
which is a mapping from file paths (relative to where the mapfs is
mounted) to absolute file paths This is useful if your C program
expects files at certain locations, but might not handle symlinks
properly. It can also be combined with e.g. fetchfs, in case your
web server has files at several different paths but the C program
wants to see all the files within one directory. The manifest can
be built using wasmfs_map_create_manifest and
wasmfs_map_add_to_manifest. When the mapfs is mounted, directories
and files corresponding to the manifest entries will be
automatically created. Importantly, only individual files can be
mapped through mapfs, not whole directories---a manifest can't map
one directory to another, just one file to another.

For example, a manifest like {'test.txt':'/resources/file.txt'}
mounted at /dat (using either wasmfs_mount or
wasmfs_create_directory) will let you write open("/dat/test.txt", O_RDONLY).

If you have a fetchfs at /fetch/ pointed to server.com and map files from /path/whatever to /fetch/url/path/file, your C code can access /path/whatever in order to lazily fetch the file at server.com/url/path/file. I can also see it being useful for OPFS, in case the OPFS file hierarchy is not what your C code wants for whatever reason.

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks promising! Some initial comments.

$MAPFS__deps: ['$stringToUTF8OnStack', 'wasmfs_create_map_backend', 'wasmfs_map_create_manifest', 'wasmfs_map_add_to_manifest'],
$MAPFS: {
createBackend(opts) {
return withStackSave(() => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The withStackSave can be on the _wasmfs_map_add_to_manifest call. That will reuse the stack each time, allowing a lower stack size to work in more cases.

// University of Illinois/NCSA Open Source License. Both these licenses can be
// found in the LICENSE file.

// This file defines the fetch backend.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// This file defines the fetch backend.
// This file defines the MAPFS backend.


namespace wasmfs {

typedef std::map<std::string,std::string> MapManifest;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
typedef std::map<std::string,std::string> MapManifest;
using MapManifest = std::map<std::string,std::string>;

(I believe this is more "modern" notation)


typedef std::map<std::string,std::string> MapManifest;

class MapBackend : public wasmfs::Backend {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class MapBackend : public wasmfs::Backend {
class MapBackend : public Backend {

(already in the namespace)

assert(manifest && "Mapfs: null manifest not supported");
}
~MapBackend() {
if(manifest != NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(manifest != NULL) {
if (manifest != NULL) {

std::shared_ptr<DataFile> createFile(mode_t mode) override;
std::shared_ptr<Directory> createDirectory(mode_t mode) override;
std::shared_ptr<Symlink> createSymlink(std::string target) override {
fprintf(stderr, "mapfs doesn't support creating symlinks");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fprintf(stderr, "mapfs doesn't support creating symlinks");
fprintf(stderr, "MAPFS doesn't support creating symlinks");

I think this is more consistent with other text?



class MapDirectory : public MemoryDirectory {
std::string dirPath;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment here. I assume this is either the "true" or the "mapped" path? Also dirPath is perhaps generic - how about truePath or mappedPath?

@JoeOsborn
Copy link
Contributor Author

Thanks! I've addressed these comments and found another place the lowercase mapfs was used instead of MAPFS.

@sbc100
Copy link
Collaborator

sbc100 commented Mar 4, 2025

When a file is mapped (e.g. /path/whatever -> /fetchfs/server.com/url/path/file) what is the relationship between the source the destination file? Is it like a hard link? Or a sym link? Will opening both the source and the destination at the same time result in a file with same inode number?

@JoeOsborn
Copy link
Contributor Author

When a file is mapped (e.g. /path/whatever -> /fetchfs/server.com/url/path/file) what is the relationship between the source the destination file? Is it like a hard link? Or a sym link? Will opening both the source and the destination at the same time result in a file with same inode number?

A "virtual data file" is created, as in the icase backend, which refers to the original file. It is different from icase in that the original file can also be accessed by some path.

What should happen/what bad things could happen/what test cases should be added to explore the possibility of the file being opened under two different names?

@sbc100
Copy link
Collaborator

sbc100 commented Mar 4, 2025

When a file is mapped (e.g. /path/whatever -> /fetchfs/server.com/url/path/file) what is the relationship between the source the destination file? Is it like a hard link? Or a sym link? Will opening both the source and the destination at the same time result in a file with same inode number?

A "virtual data file" is created, as in the icase backend, which refers to the original file. It is different from icase in that the original file can also be accessed by some path.

What should happen/what bad things could happen/what test cases should be added to explore the possibility of the file being opened under two different names?

Would symlinks be a simpler/quicker solution?

Should we add hard link support maybe?

@JoeOsborn
Copy link
Contributor Author

We thought about symlinks before, but this will work even if the C code doesn't do the right thing wrt symlinks. One could argue that such C code should be fixed instead of having a new mapping backend, but it becomes complicated with e.g. whether writes through remapping should modify the location of the symlink or the underlying target location (perhaps not relevant for the fetch backend but maybe relevant in other settings).

@sbc100
Copy link
Collaborator

sbc100 commented Mar 4, 2025

We thought about symlinks before, but this will work even if the C code doesn't do the right thing wrt symlinks. One could argue that such C code should be fixed instead of having a new mapping backend, but it becomes complicated with e.g. whether writes through remapping should modify the location of the symlink or the underlying target location (perhaps not relevant for the fetch backend but maybe relevant in other settings).

Fair enough, symlink trees are not the same as normal files.

How about hard links? @tlively @kripken did you even think of adding support for the same file existing in multiple places? (or more than one file having the same inode number?)

@tlively
Copy link
Member

tlively commented Mar 5, 2025

No, I don't think we ever seriously considered supporting hard links. Backends like OPFS would not be able to support them because they could not be reflected in the underlying storage. I don't have enough of the system paged into memory to say what other specific problems might arise from allowing files to be reachable from multiple paths, but I am very wary of the idea.

Does this backend make the same file visible from multiple paths, or does it virtualize and provide a mapped view into some underlying backend that is not directly mounted into the file system? I couldn't tell from a quick look at the code. I would strongly prefer the latter design to avoid making a file directly accessible from multiple paths.

@JoeOsborn
Copy link
Contributor Author

This does the former, but I think it’s a small change to make it do the latter:

  1. Take a backend parameter
  2. Use createFile / createDirectory on the inner backend at the same time as they are created on the mapfs backend.

This would make it more like the icase backend, so I’ll give that a shot over the next day or two.

@JoeOsborn
Copy link
Contributor Author

So I think it's not actually going to be possible with the current API to define mapfs as a front to another, non-mounted filesystem, especially if the wasmfs_create_file API is going away, because intermediate directories &c may need to be created in the non-mounted filesystem. The backend would need specialized API that takes both a real path on the wasmfs filesystem and a virtual path on the mapped-to filesystem backend. Regular open(O_CREAT)/mkdir type calls won't work to create files in mapfs without a much more sophisticated mapping than a simple manifest.

Not every filesystem backend allows writes, and already "creating" a file in the fetch backend (for example) is an awkward idea (since files are readonly)---moreover, doMkdir and doOpen also "mount" the created files on the filesystem, so the syscalls both create and mount. So I believe the best solution is to withdraw mapfs and require that C programs handle symlinks properly, and also make fetchfs automatically create files on open (even without O_CREAT)---but even that might not actually be possible, since getChild() can't know if it's getting a data file, directory, or something else.

So as I see it, the manifest makes sense as part of fetchfs (mapping filesystem paths and URLs, designating actually accessible files) but has big question marks when implemented as a standalone filesystem. So I think a standalone mapfs has too many rough edges and shouldn't be implemented; either fetchfs should use a manifest (and avoid the need for "creating"/opening fetched files) or we should just require that C programs do the right thing with symlinks.

@tlively
Copy link
Member

tlively commented Mar 7, 2025

Thanks for that analysis, @JoeOsborn. It sounds right to me. I would be fine with adding manifests to fetchfs if there is demand for it, but I would also hope that most programs handle symlinks correctly.

@JoeOsborn
Copy link
Contributor Author

I’ll close this as a successful experiment.

@JoeOsborn JoeOsborn closed this Mar 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants